FOREWORD


The Software Engineering Laboratory (SEL) is an organization 
sponsored by the National Aeronautics and Space Administra- 
tion, Goddard Space Flight Center (NASA/GSFC) and created 
for the purpose of investigating the effectiveness of soft- 
ware engineering technologies when applied to the develop- 
ment of applications software. The SEL was created in 1977 
and has three primary organizational members: 

NASA/GSFC (Systems Development and Analysis Branch)

The University of Maryland (Computer Sciences Department)

Computer Sciences Corporation (Flight Systems Operation)

The goals of the SEL are (1) to understand the software de- 
velopment process in the GSFC environment; (2) to measure 
the effect of various methodologies, tools, and models on 
this process; and (3) to identify and to apply success-
ful development practices. The activities, findings, and
recommendations of the SEL are recorded in the Software En- 
gineering Laboratory Series, a continuing series of reports 
that includes this document. A version of this document was 
also issued as University of Maryland Technical Report 
TR-1236.

Contributors to this document are 

Victor Basili (University of Maryland) 

David Weiss (Naval Research Laboratory) 

Single copies of this document can be obtained by writing to 

Frank E. McGarry 
Code 582.1 
NASA/GSFC 

Greenbelt, Maryland 20771 
ABSTRACT


An effective data collection methodology for evaluating software
development methodologies was applied to four different software
development projects. Goals of the data collection included
characterizing changes and errors, characterizing projects and programmers,
identifying effective error detection and correction techniques, and
investigating ripple effects.

The data collected consisted of changes (including error corrections) 
made to the software after code was written and baselined, but before 
testing began. Data collection and validation were concurrent with
software development. Changes reported were verified by interviews with
programmers. Analysis of the data showed patterns that were used in
satisfying the goals of the data collection. Some of the results are
summarized in the following:

1. Error corrections aside, the most frequent type of change was an
unplanned design modification.

2. The most common type of error was one made in the design or
implementation of a single component of the system. Incorrect requirements
and misunderstandings of functional specifications, interfaces, support
software and hardware, and languages and compilers were generally not
significant sources of errors.

3. Despite a significant number of requirements changes imposed on
some projects, there was no corresponding increase in frequency of
requirements misunderstandings.

4. More than 75% of all changes took a day or less to make.

5. Changes tended to be nonlocalized with respect to individual
components but localized with respect to subsystems.

6. Relatively few changes resulted in errors. Relatively few errors
required more than one attempt at correction.

7. Most errors were detected by executing the program. The cause of
most errors was found by reading code. Support facilities and techniques
such as traces, dumps, cross-reference and attribute listings, and program
proving were rarely used.
1. Introduction

In previous and companion papers [1,2,3,4] we have discussed how to
obtain valid data that may be used to evaluate software development methodolo- 
gies in a production environment. Briefly, the methodology consists of the fol- 
lowing five elements. 

(1) Identify goals. The goals of the data collection effort are defined before any
data collection begins. We often relate them to how well the goals for a pro-
duct or process are met.

(2) Determine questions of interest from the goals. From the goals, specific
questions are derived. Answering the questions derived from each goal 
satisfies the goal. 

(3) Develop a data collection form. The data collection form used is tailored to
the product or process being studied and to the questions of interest.

(4) Develop data collection procedures. Data collection is easiest when the 
data collection procedures are part of normal configuration control pro- 
cedures. 

(5) Validate and analyze the data. Reviews and analyses of the data are con-
current with software development. Validation includes examining com-
pleted data collection forms for completeness and consistency. Where
necessary, interviews with the person(s) supplying the data are conducted.
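
Step (5) can be illustrated with a minimal sketch of a completeness and consistency check on a change report form. The field names, the effort labels, and the specific rules below are assumptions made for illustration only; they are not the layout of the actual SEL form.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical effort labels; the real form's wording may differ.
EFFORT_CHOICES = {"<=1 hour", "1 hour-1 day", "1-3 days", ">3 days"}


@dataclass
class ChangeReport:
    """One completed change report form (all field names are hypothetical)."""
    change_id: str
    is_error_correction: bool
    effort: Optional[str] = None
    components_changed: Optional[List[str]] = None
    misunderstanding: Optional[str] = None  # meaningful only for error corrections


def validate(report: ChangeReport) -> List[str]:
    """Return a list of problems; an empty list means the form passes."""
    problems = []
    # Completeness: every form needs an effort entry and at least one component.
    if report.effort is None:
        problems.append("effort missing")
    elif report.effort not in EFFORT_CHOICES:
        problems.append("effort not a known subcategory")
    if not report.components_changed:
        problems.append("no components listed")
    # Consistency: error-only fields must agree with the kind of change.
    if report.is_error_correction and report.misunderstanding is None:
        problems.append("error correction without a causative misunderstanding")
    if not report.is_error_correction and report.misunderstanding is not None:
        problems.append("misunderstanding given for a non-error change")
    return problems


def needs_interview(reports: List[ChangeReport]) -> List[str]:
    """Identifiers of forms that fail validation and call for a follow-up interview."""
    return [r.change_id for r in reports if validate(r)]
```

In this sketch a form that fails any check is simply flagged for an interview with the person who supplied it, mirroring the last sentence of step (5).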

The purpose of this paper is to present the results from such an evaluation. 
The data presented here were collected as part of the studies conducted by 
NASA’s Software Engineering Laboratory [5].

Overview of the Projects Studied 

The methodology described in [1] was used to study five projects in two
different environments: a research group at the Naval Research Laboratory
(NRL), and a NASA software production environment at Goddard Space Flight
Center (GSFC). The NRL studies have been previously presented [3,6,3,?] and
will not be further discussed here. A brief description of the NASA projects fol-
lows.

The Software Engineering Laboratory 

The Software Engineering Laboratory (SEL) is a NASA-sponsored project to
investigate the software development process, based at Goddard Space Flight
Center (GSFC). A number of different software development projects are being
studied as part of the SEL investigations [8,5]. Studies of changes made to the 
software as it is being developed constitute one part of those investigations. 
Typical projects studied by the SEL are medium size FORTRAN programs
that compute the orientation (known as attitude) of unmanned spacecraft,
based on data obtained from on-board sensors. Attitude solutions are displayed
to the user of the program interactively on CRT terminals. Because the basic
functions of these attitude determination programs tend to change slowly with
time, large amounts of design and sometimes code are often re-used from one
program to the next. The programs range in size from about 20,000 to about
120,000 lines of source code. They include subsystems to perform such func-
tions as reading and decoding spacecraft telemetry data, filtering sensor data,
computing attitude solutions based on the sensor data, and providing an
(interactive) interface to the user.

Development is done by contractor personnel in a "production" environ-
ment, and is often separated into two distinct stages. The first stage is a high-
level design stage. The system to be developed is organized into subsystems,
and then further subdivided. Each subsystem generally performs a major sys-
tem function, such as processing telemetry data. For the purposes of the SEL,
each named entity in the system is called a component. The result of the first
stage is a tree chart showing the functional structure of the subsystem, in some
cases down to the subroutine level; a system functional specification describing,
in English, the functions of the system; and decisions as to what software may be
reused from other systems.

The second stage consists of completing the development of the system.
Different components are assigned to (teams of) programmers, who write,
debug, test, and integrate the software. Before delivery, the software must pass
a formal acceptance test. On some projects, programmers produce no inter-
mediate specifications between the functional specifications produced as part of
the first stage and the code. Some projects produce pseudo-code specifications
for individual subroutines before coding them in FORTRAN. During the period of
time that the SEL has been in existence, a structured FORTRAN preprocessor
has come into general use.

The principal design goal of the major SEL projects is to produce a working
system in time for a spacecraft launch. In addition, a continuing NASA goal is
introducing improved techniques into its software development process. Results
from SEL studies of three different NASA projects, denoted SEL1, SEL2, and
SEL3, are included here.

2. Application Of The Experimental Procedure 

The goals, questions of interest, and data categorizations, as described in
[1], for the SEL projects are shown in table 1 and lists 1 and 2. The SEL studies
represent a full-scale implementation of the data collection methodology in a
software production environment. Because the SEL environment is not pri-
marily devoted to developing and proving new methodologies, the emphasis is
more on investigating the software development environment than in a study
such as [3].

SEL Goals 

Since the primary emphasis in SEL projects is not on developing and prov-
ing new methodologies, the data collection goals are generally methodology-
independent. Nevertheless, many of the projects do use recently-developed
software engineering technology with a view towards evaluating the technology
in the NASA/GSFC environment. (An example is program design language, used
in several SEL projects.) As a result, the goal "evaluate effectiveness of metho-
dologies" is used, but is not based on specific claims for specific methodologies.
1. Characterize changes (especially in ways that permit comparisons across
projects and environments).

2. Characterize errors (especially in ways that permit comparisons across
projects and environments).

3. Evaluate effectiveness of methodologies in NASA/GSFC environment.

4. Suggest ways of improving NASA/GSFC software development practices.

5. Verify that concurrent data validation is needed.

6. Identify good measures of correctness.

7. Identify effective techniques for detecting errors.

8. Identify effective techniques for obtaining the information needed to
correct errors.

9. Investigate the "ripple" effect, i.e. do most errors require more than one at-
tempt at correction or result in changes distributed over several different
components of the system?

10. Characterize projects.

11. Characterize programmers.

12. Find factors that have significant effects on types and distributions of er-
rors.

Table 1. Data Collection Goals for the SEL Projects


1. What was the distribution of changes according to the reason for the
change and the effect of the change? Reasons were considered to be
one of the following:

a. a change in requirements or specifications,

b. a change in design,

c. a change in hardware environment (e.g. a new piece of hardware
added to the system to be used by the program),

d. a change in software environment (e.g. a new version of the
FORTRAN compiler),

e. an optimization,

f. other.

Since a change to any of the items in the preceding list could affect oth-
ers on the list, the set of items that could be affected by a change were
as follows:

a. requirements or specifications,

b. design,

c. the hardware environment,

d. the software environment,

e. optimization algorithms and their implementation.


List 1. Questions of Interest 
2a. What was the distribution of changes across system components?

2b. For each change, how many components have to be examined in order
to make the change?

3. What was the distribution of time required to design changes? For error 
corrections, the time required to design the change was assumed to be 
the same as the time required to understand the error and propose a 
correction. 

4. What was the ratio of changes not made to correct an error to error 
corrections as a function of time during the development cycle? 

5. What was the distribution of errors according to the misunderstandings 
that caused them (and what was the ratio of non-clerical to clerical er- 
rors?) ? 

6. What was the distribution of effort required to correct errors? 

7. What was the distribution of effort to correct errors across misunder- 
standings causing errors? 

8. How many errors were the result of a software change or modification (a 
modification is a change made for some purpose other than correcting 
an error)? 

9. What was the distribution of errors across error detection techniques?

10. What was the distribution of errors across error correction techniques? 

11. What was the number of attempted error corrections per error?

12. What was the distribution of error corrections across project phases? 

13. What was the ratio of errors to various measures often associated
with effort and productivity? These measures include

a. number of developers 

b. number of lines of code 

c. number of machine instructions 

d. number of memory words 

e. number of person-hours 

f. number of work assignments. 

14. What was the distribution of errors per person according to the number 
of people involved? 

15. What was the number of errors for projects requiring memory overlays
compared to those not requiring overlays?

16. What was the distribution of errors according to programmer? 

17. How often must reported change data be corrected as a result of the 
data validation process? 


List 1. Questions of Interest (continued) 
SEL Questions of Interest

Since the software was produced in a production environment with
stringent deadlines, it was desirable to minimize the overhead involved in
collecting and validating data. Because there were no design goals with
respect to the use of particular methodologies, questions relating to the
success of particular methodologies were generally not considered.

SEL Data Categories 

Selection of the data categories was based on acquiring the data needed
to answer the questions of interest, on maintaining a reasonably small set of
subcategories for convenience in collecting and interpreting the data, and
on subjective estimates of the uniformity of the data distribution across the
subcategories.

The "catch-all" category "other" has been inserted for all changes that 
will not fit one of the other categories. If the categories selected agree well 
with the actual change distribution across the subcategories, few errors will 
fall into the other subcategory. (The reverse situation is not necessarily a
sign of a poorly designed categorization scheme; the "other" changes may 
provide the most insight into the development process.) 

Data Collection, Validation, and Analysis 

Formal procedures used for data collection and validation are described 
in [1], as is the data collection form.

Answering Questions of Interest 

The questions of interest are answered by presenting and analyzing the
data distribution(s) associated with each question. Because of space limita-
tions, answers to the individual questions, and most tables and histograms
used in the data analysis, have been included in the Appendix.

Overview Of The Data 

Tables 2 and 3 contain, for quick reference, an overview of the data col-
lected and a summary of information about the projects. Tables 4 through 7
contain values of parameters often thought to characterize software 
development projects. 

3. Interpretations 

The research methodology permits at least one quite straightforward 
way of interpreting the data: using the distributions to answer the questions
of interest, thereby satisfying the goals of the study. One may also compare
distributions across different projects, where appropriate, and look for com- 
mon characteristics. Both of these processes lead to new goals and ques- 
tions, some of which may be answerable with the available data, and some 
requiring new studies. Examples of both will be presented here. 

List 3 shows, for each goal, the corresponding questions of interest. 
Where the same question(s) are used to satisfy several goals, the goals are 
listed together.




1. Effort to change. Subcategories:

a. one hour or less

b. one hour to one day

c. one day to three days

d. more than three days

2. Cause of change and effect of change. Causes of changes were considered
to be one of the following:

a. a change in requirements or specifications,

b. a change in design,

c. a change in hardware environment,

d. a change in software environment,

e. an optimization,

f. other.

Since a change to any of the items in the preceding list could affect
others on the list, the set of items that could be affected by a change
were as follows:

a. requirements or specifications,

b. design,

c. the hardware environment,

d. the software environment,

e. optimization algorithms and their implementation.

3. Component changes. This categorization shows, for each component,
the number of changes made to the component. There is,
accordingly, one subcategory for each component of the system. A
similar categorization is used for the number of times each
component is examined, i.e. the number of changes that required
examination of the component.

4. Result of modification (for error corrections only). Subcategories:

a. Result of modification not to correct an error, for errors resulting
from a program change other than an error correction,

b. Result of error correction, for errors resulting from a program
change made to correct an error (whether a prior correction attempt
for the same error or a correction for some other error),

c. Not the result of a modification, for errors that are unrelated
to program changes.

5. Time to isolate cause (for error corrections only). Subcategories:

a. one hour or less

b. one hour to one day

c. more than one day


List 2. Data Categories
6. Causative misunderstanding. Subcategories:

a. misunderstanding of requirements 

b. misunderstanding of functional specifications 

c. misunderstanding of other documentation 

d. misunderstanding of design (excluding interface) 

This subcategory was deemed sufficiently interesting to be 
further subdivided into the following subcategories: 
misunderstanding of intended use of the erroneous 
segment/proc/module, misunderstanding of the value or structure 
of data, and other. 

e. misunderstanding of interface

f. misunderstanding of programming language, further subdivided into
syntax and semantics misunderstandings

g. misunderstanding of hardware environment 

h. misunderstanding of software environment 

i. clerical error 

j. other 

7. Development phase when error occurred. Subcategories: 

a. requirements 

b. functional specifications 

c. design 

d. coding and test 

e. other 

f. can't tell, for situations where the person supplying the information
does not know the phase.

8. Method of detection. Subcategories:

a. test runs 

b. code reading by programmer 

c. code reading by other person 

d. reading documentation 

e. proof technique 

f. trace 

g. dump 

h. cross-reference 

i. attribute list 

j. special debug code 

k. error messages, further subdivided into general error messages, and 
project specific (i.e. coded especially for this project) error
messages 

l. inspection of output 

m. other 


List 2. Data Categories (continued) 
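
As an illustration of how a categorization from List 2 turns into a data distribution, the sketch below encodes the method-of-detection subcategories as an enumeration and tallies a set of reports. The enumeration member names and the `distribution` helper are hypothetical; only the category labels come from List 2.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Detection(Enum):
    """Method-of-detection subcategories, labeled as in List 2."""
    TEST_RUN = "test runs"
    CODE_READING_PROGRAMMER = "code reading by programmer"
    CODE_READING_OTHER = "code reading by other person"
    READING_DOCUMENTATION = "reading documentation"
    PROOF_TECHNIQUE = "proof technique"
    TRACE = "trace"
    DUMP = "dump"
    CROSS_REFERENCE = "cross-reference"
    ATTRIBUTE_LIST = "attribute list"
    SPECIAL_DEBUG_CODE = "special debug code"
    ERROR_MESSAGES = "error messages"
    INSPECTION_OF_OUTPUT = "inspection of output"
    OTHER = "other"


def distribution(values: Iterable[Detection]) -> Dict[Detection, float]:
    """Fraction of reports falling in each subcategory that was observed."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical sample of four error reports, not SEL data.
sample = [
    Detection.TEST_RUN,
    Detection.TEST_RUN,
    Detection.CODE_READING_PROGRAMMER,
    Detection.OTHER,
]
shares = distribution(sample)
```

A large share in `Detection.OTHER` would be the signal, noted in the discussion of the "other" category, that the subcategories deserve another look.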
Project   Number of   Number of       Number of
          Changes     Modifications   Errors

SEL1      281         101             180
SEL2      229         110             119
SEL3      760         453             307

Table 2. Overview of Data Collected


Project   Effort   Number of    Lines of   Dev. Lines    Number of
                   Developers   Code (K)   of Code (K)   Components

SEL1      79.0     5            50.9       46.5          502
SEL2      39.6     4            75.4       31.1          490
SEL3      98.7     7            85.4       78.6          639


Table 3. Summary of Project Information



Project   Changes Per K Lines   Errors Per K Lines   Error To Mod Ratio
          Of Developed Code     Of Developed Code    (NonClericals Only)

SEL1      6.0                   3.9                  1.3
SEL2      7.4                   3.8                  .92
SEL3      9.7                   3.9                  .54


Table 4. Change and Error Densities
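
The first two columns of Table 4 follow from Tables 2 and 3 by simple division. The sketch below reproduces that arithmetic, reading the SEL1 change count as 281 (the sum of its 101 modifications and 180 errors) and the developed code sizes as 46.5, 31.1, and 78.6 thousand lines. The dictionary layout and the function name are illustrative choices, not part of the study.

```python
# Counts from Table 2 and developed code size (thousands of lines) from Table 3.
projects = {
    "SEL1": {"changes": 281, "errors": 180, "dev_kloc": 46.5},
    "SEL2": {"changes": 229, "errors": 119, "dev_kloc": 31.1},
    "SEL3": {"changes": 760, "errors": 307, "dev_kloc": 78.6},
}


def densities(project):
    """Return (changes, errors) per thousand lines of developed code, to one decimal."""
    changes_per_k = round(project["changes"] / project["dev_kloc"], 1)
    errors_per_k = round(project["errors"] / project["dev_kloc"], 1)
    return changes_per_k, errors_per_k


for name, data in projects.items():
    print(name, densities(data))
# SEL1 (6.0, 3.9)
# SEL2 (7.4, 3.8)
# SEL3 (9.7, 3.9)
```

The error-to-modification ratio in the third column is restricted to nonclerical errors, whose counts are not given in Tables 2 and 3, so it is not recomputed here.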
Project   Erroneous Change Rate   Errors Resulting    Repeated Error Ratio
          (Ratio Of Changes       From Change         (Average Number
          Resulting In Errors     (As Percentage      Of Corrections
          To All Changes)         Of NonClericals)    Per Error)

SEL1      .025                    5                   1.02
SEL2      .061                    14                  1.08*
SEL3      .041                    12                  1.05

* Upper bound. Exact number of repeated errors for SEL2 is unknown;
by conservative means, the ratio could be estimated as 1.04.

Table 5. Measures of Erroneous Change
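
The two ratio measures in Table 5 can be written out directly from their column headings. The following is a minimal sketch; the function names and the sample inputs are hypothetical and are not the SEL data.

```python
from typing import Sequence


def erroneous_change_rate(changes_resulting_in_errors: int, all_changes: int) -> float:
    """Ratio of changes resulting in errors to all changes."""
    if all_changes <= 0:
        raise ValueError("all_changes must be positive")
    return changes_resulting_in_errors / all_changes


def repeated_error_ratio(corrections_per_error: Sequence[int]) -> float:
    """Average number of correction attempts per error."""
    if not corrections_per_error:
        raise ValueError("at least one error is required")
    return sum(corrections_per_error) / len(corrections_per_error)


# Hypothetical project: 200 changes, 5 of which introduced an error,
# and 50 errors of which one needed a second correction attempt.
rate = erroneous_change_rate(5, 200)
ratio = repeated_error_ratio([1] * 49 + [2])
```

A repeated error ratio close to 1 means that nearly every error was fixed on the first attempt, which is the situation all three projects in Table 5 show.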


Project   Number Of People   Errors Per Person

SEL2      4                  25
SEL1      5                  26
SEL3      7                  44

Table 6. Errors Per Person By Number Of People


Project   Effort            Errors Per     Changes Per
          (People-Months)   Person-Month   Person-Month

SEL2      39.6              2.4            .0
SEL1      79.0              1.7            3.6
SEL3      98.7              3.1            7.7


Table 7. Errors Per Effort By Effort
In the following sections each goal is satisfied by presenting conclusions
based on the answers to the questions corresponding to the goal. Sections con-
taining discussions of goals are headed by short descriptions of goals.
Identifiers in parentheses following the goal descriptions are references to the
goal, e.g. (G2) is a reference to goal 2. Not all goals are discussed here. Goal 5,
"verify that concurrent data validation is needed," is discussed in a companion 
paper [1]. 

Inspection of the change distributions shows that, despite the similarities in 
application, environment, and personnel, there are distinct differences among 
SEL projects. Some projects, notably SEL3, seem to have considerably less trou-
ble in the development phase than others.

There are two possible explanations: (1) the SEL3 developers did a better
job in producing correct software, or (2) the SEL3 system was not subjected to a
thorough inspection for errors. The latter explanation could be tested by
analyzing the errors found in the projects during their use and maintenance.
Attempting to satisfy this goal is beyond the scope of the research reported
here.

Goal: Characterize Modifications (G1)

All three projects operated in a stable environment, where there were few 
changes to the support software and hardware; none of them made many 
changes for the purpose of adding or deleting debug code. The results support 
the view that the SEL designers have organized their systems so that, for pur- 
poses of redevelopment, most changes are confined to a few subsystems. 

One way that the projects clearly differ is in their reasons for making un- 
planned design changes. Some spend a great deal of time on optimization and 
improving the services the system offered to its users, others on attempting to 
improve the clarity of the code and its documentation. It is interesting to note 
that SEL2 and SEL3, whose programmers had different reasons for making un- 
planned design modifications, had the same task leader and some of the same 
staff.

Coupled with the effort and the component-wise change analyses, these 
results suggest that most unplanned design modifications are small and only in-
volve one component of the system. Several explanations are possible: either
the programmers act as "filters," rejecting unplanned modifications that are not 
easy to make, or reasons for modifying the design are not characteristic of the 
programmers, but rather of some external source. 

Some conclusions concerning characterization of modifications 

Although it is tempting to try to characterize a "typical" modification, there 
is too much variability in the sources of modifications for the different projects 
to do so safely. The sources for most modifications fall into one of a small
number of subcategories, such as requirements modifications, planned enhance-
ments, improvements of clarity, improvements of user services, and optimiza-
tions. The distributions over these categories distinguish one project from
another.

The SEL projects are all similar with respect to the effort required to modi-
fy the programs; most changes and modifications take a day or less to make. 
Furthermore, although the changes tend to be nonlocalized with respect to indi- 
vidual components (most components that are changed are only changed once 
or twice), they are localized with respect to subsystem, i.e. the majority of 
changes are made in one or two subsystems. 
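
The two notions of localization used above can be made concrete with a small sketch. It assumes each record names the subsystem and component touched by one change, which is a simplification (a real change may touch several components), and all subsystem and component names are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def localization(changes: List[Tuple[str, str]], top_n: int = 2) -> Tuple[float, float]:
    """Summarize how changes spread over components and subsystems.

    `changes` holds one (subsystem, component) pair per change.
    Returns two fractions:
      - the share of changed components that were changed at most twice
        (a high value means changes are nonlocalized by component);
      - the share of all changes that fall in the `top_n` most-changed
        subsystems (a high value means changes are localized by subsystem).
    """
    if not changes:
        raise ValueError("at least one change is required")
    by_component = Counter(component for _, component in changes)
    by_subsystem = Counter(subsystem for subsystem, _ in changes)
    rarely_changed = sum(1 for n in by_component.values() if n <= 2)
    in_top = sum(n for _, n in by_subsystem.most_common(top_n))
    return rarely_changed / len(by_component), in_top / len(changes)


# Hypothetical change records, not SEL data.
records = [
    ("TELEMETRY", "T1"), ("TELEMETRY", "T2"), ("TELEMETRY", "T3"),
    ("ATTITUDE", "A1"), ("ATTITUDE", "A2"),
    ("DISPLAY", "D1"),
]
component_share, subsystem_share = localization(records)
```

With these invented records every component is changed only once, yet five of the six changes fall in two subsystems, which is the pattern described for the SEL projects.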




Goal: 

Characterize changes, 

Questions: 

What was the distribution of modifications according to the reason for the
modification? 

What was the distribution of changes across system components? 

What was the distribution of effort required to design changes? 

Goal: 

Characterize errors. 

Questions:

What was the distribution of errors according to the misunderstandings that 
caused them? 

What was the distribution of effort required to correct errors? 

What was the distribution of effort to correct errors across misunderstand- 
ings causing errors? 

How many errors were the result of a software change? 

Goal: 

Characterize projects. 

Goal:

Characterize programmers. 

Goal: 

Find factors that have significant effects on types and distributions of er- 
rors. 

Goal: 

Evaluate effectiveness of methodologies in NASA/GSFC environment. 

Goal: 

Suggest ways of improving NASA/GSFC software development practices.

Questions:

All questions are used in satisfying this goal. See list 1.

Goal:

Verify that concurrent data validation is needed.

Question: 

How often must reported change data be corrected as a result of the data 
validation process? 


List 3. Relationship Between Goals and Questions 
Goal:

Identify good measures of correctness.

Questions:

What was the distribution of effort required to design changes?

What was the ratio of changes not made to correct an error to error correc-
tions as a function of time during the development cycle?

What was the distribution of errors according to the misunderstandings that
caused them?

What was the distribution of effort required to correct errors?

What was the distribution of effort to correct errors across misunderstand-
ings causing errors?

How many errors were the result of a software change?

What was the distribution of errors across error detection techniques?

What was the number of attempted error corrections per error?

What was the ratio of errors to various measures often associated with
effort and productivity?

What was the distribution of errors per person according to the number of 
people involved? 

What was the number of errors for projects requiring memory overlays 
compared to those not requiring overlays? 

What was the distribution of errors according to programmer? 

Goal:

Identify effective techniques for detecting errors. 

Question: 

What was the distribution of errors across error detection techniques? 

Goal:

Identify effective techniques for obtaining the information needed to 
correct errors. 

Question: 

What was the distribution of errors across error correction techniques? 

Goal: 

Investigate the "ripple" effect, i.e. do most errors require more than one at-
tempt at correction or result in changes distributed over several different
components of the system?

Question: 

What was the number of attempted error corrections per error? 

List 3. Relationship Between Goals and Questions (continued) 
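
The goal-to-question relationship in List 3 can be held in a simple mapping, which makes it easy to see which goals a given set of answered questions supports. The sketch below covers only five of the goals, uses the goal numbers of Table 1 and the question numbers of List 1 (matched by question wording), and the helper names are hypothetical.

```python
from typing import Dict, List, Set

# Goal number (Table 1) -> question numbers (List 1), matched by wording.
GOAL_QUESTIONS: Dict[int, List[int]] = {
    5: [17],                                            # concurrent data validation
    6: [3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16],       # measures of correctness
    7: [9],                                             # error detection techniques
    8: [10],                                            # error correction techniques
    9: [11],                                            # "ripple" effect
}


def goals_for_question(question: int) -> List[int]:
    """Goals that a given question helps to satisfy, in ascending order."""
    return sorted(g for g, qs in GOAL_QUESTIONS.items() if question in qs)


def satisfied_goals(answered: Set[int]) -> List[int]:
    """Goals whose questions have all been answered."""
    return sorted(g for g, qs in GOAL_QUESTIONS.items() if set(qs) <= answered)
```

For example, `goals_for_question(9)` returns `[6, 7]`, reflecting that the detection-technique question serves both the correctness-measure goal and the detection goal.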
Goal: Characterize Errors (G2)

From the answers to the questions we may conclude that the SEL program-
mers tend to spend their time finding and correcting many "small" errors made
while designing or implementing single routines, rather than struggling with a
few "large" errors, or trying to understand requirements or interfaces.

All the SEL projects handled changes with little trouble; relatively few er-
rors were the result of a change to the software. The SEL developers apparently
understand their requirements well enough that they can handle changes to
them without much trouble. Interfaces, often considered to be a major source
of errors, do not seem especially troublesome. There is some indication that the
interface and requirements misunderstandings that do occur are more difficult to
correct than others. However, the small number of errors involved makes it
dangerous to draw such a conclusion.

We believe there are two factors that explain the shape of the error distri- 
butions and their similarity across projects. 

a. The SEL projects all have the same application. They are essentially
redevelopments, each using the same overall design and often much of the
same code as previous projects. Although new individual programmers may
be used from one project to the next, the same people do the top level
design. Having found a successful design, they reuse it.

b. The SEL projects used programmers who were familiar with the language
they were using, and both were developed in a stable environment, i.e.
there were few changes in support hardware or software.

Some conclusions concerning error characterization 

Based on the foregoing analysis, one might characterize a "typical" error as 
one that occurs in the design or implementation of a single component, is easy 
to correct, and whose cause is easy to find.

Goal: Evaluate Effectiveness Of Methodologies In NASA/GSFC Environment (G3)

It was expected that various software engineering techniques would be tried in
the course of these studies. However, it was found to be extremely difficult to 
characterize the different techniques and the differences in the ways in which 
the techniques were applied for the SEL projects reported here. Consequently, 
this goal could not be satisfied. 

Goal: Suggest Ways Of Improving NASA/GSFC Software Development Practices
(G4) 

Previous analyses have shown that the most abundant source of errors lies 
in the process of designing and implementing individual components of the SEL 
projects. Improvements should come from the introduction of any techniques 
that assist the individual programmer in preventing and detecting errors. A 
number of techniques and tools have been suggested to help in this process. A 
few are listed in the following. 

1. Program Design Language [9] 

2. Code Reading and Inspections [10] 

3. Program Proving [9, 11] 

4. Programming By Stepwise Refinement [12] 

5. Formal Specifications [13, 14] 

6. Information Hiding [15] 

7. Languages that provide strong typing, such as Pascal [16] 
One would expect the introduction of some or all of these and other,
similar techniques to perturb the SEL environment initially. After the initial 
learning period, if such techniques meet the claims made for them, a shift in 
the error distributions could be expected. 

Goal: Identify Good Measures of Correctness (G6)

In addition to various single parameters, one may also consider a 
number of different distributions as correctness measures. Candidates are 
the sources of nonclerical errors, the effort to design error corrections, the 
effort to isolate the error cause, the frequency distribution of error correc-
tions, error corrections according to the subsystem in which they occur, and
errors according to project phase.

Several of the preceding distributions serve to locate the most trouble- 
some phases of the development process, and the most error-prone parts of 
the system. Others may be used as indicators of average difficulty in 
correcting errors.

Some conclusions concerning measures of correctness 

It is not possible to identify from the data a single good parameter that 
can be used to measure correctness. Issues such as correctness relative to 
the amount of work that had to be done, or to the number of changes that 
had to be made, cannot easily be judged and cannot be discerned from a sin- 
gle parameter. Rather, a combination of parameters and distributions may 
be used to discover what and where difficulties were encountered in producing
a particular system. Attempting to define the precise set of distributions
and parameters to use is beyond the scope of this research. We do suggest 
that some of the following be used. 

a. Ratio of errors to modifications, to give an indication of how 
the developers were spending their time; 

b. Rate of erroneous changes, to give an indication of the 
difficulty the developers had in making changes; 

c. Sources of changes and sources of errors, to give an indication 
of the kinds of problems the developers had to handle, and the 
kinds of difficulties they had; 

d. Effort to make change, effort to isolate cause of error, and 
effort to design fix by source of error, to indicate difficulty 
of correcting errors; 

e. Phase of entry of errors into the system, to indicate whether
certain aspects of the development caused trouble, or whether
difficulties tended to be spread out over the entire development.
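
As a rough illustration of measures (a) and (b), the following Python sketch
computes them from a list of change records. The record layout (the `kind` and
`caused_error` fields) and the sample records are assumptions made for this
example; they are not the SEL change report form and not SEL data.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One validated change report (hypothetical fields, not the SEL form)."""
    kind: str           # "modification" or "error" (an error correction)
    caused_error: bool  # True if this change itself introduced an error


def errors_to_modifications(changes):
    """Measure (a): number of error corrections per modification."""
    errors = sum(1 for c in changes if c.kind == "error")
    mods = sum(1 for c in changes if c.kind == "modification")
    if mods == 0:
        raise ValueError("no modifications recorded")
    return errors / mods


def erroneous_change_rate(changes):
    """Measure (b): fraction of all changes that resulted in an error."""
    if not changes:
        raise ValueError("no changes recorded")
    return sum(1 for c in changes if c.caused_error) / len(changes)


# Made-up sample: 3 modifications, 2 error corrections, 1 erroneous change.
sample = [
    Change("modification", False),
    Change("modification", True),
    Change("modification", False),
    Change("error", False),
    Change("error", False),
]
print(errors_to_modifications(sample))
print(erroneous_change_rate(sample))
```

Neither number is meaningful alone; as the text argues, they are intended to be
read alongside the distributions in (c) through (e).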

Goal: Identify Effective Error Detection Techniques (G7) 

Executing the program was the most successful means for detecting errors.
The distributions show what might be called a traditional approach to
error detection: either test runs, or a programmer reading over her own
code.

Goal: Identify Effective Error Correction Techniques (G8)

It is clear from the data that the programmers favored code reading as
an error correction technique. While this is not surprising, the lack of use of
other techniques is surprising. Although we cannot determine if program
reading is popular because programmers are writing programs that are easy
to read, we can say that improving the readability of programs should
improve the error correction process.

Goal: Investigate The Ripple Effect (G9) 

There is nothing in the data to suggest a ripple effect of any 
significance. The lack of such an effect may be the result of the SEL experi- 
ence with the application. It may also be a result of monitoring the projects 
primarily through the development phase. Continued monitoring throughout 
the project lifetime might reveal such an effect as the software undergoes 
further change. 

Goal: Characterize Projects (G10) 

Examination of various parameters previously discussed shows that it is 
risky to characterize a project with a single parameter or distribution. 
Furthermore, it is difficult to predict the effect that a particular project
characteristic will have on any particular change distribution. We can note
variations in distributions that seem to distinguish some projects from
others, and use the distinguishing distributions as the basis for more detailed
experiments.

The proposed distinguishing distributions are listed in the following.

Change Distribution 

The distribution of changes across modifications and nonclerical er- 
rors clearly distinguishes SEL3 from the other SEL projects. 

Sources Of Modifications 

The sources of modifications distributions all show their strongest 
peaks in the same places, but have secondary peaks in different 
places. These secondary peaks may be used to distinguish among
projects. SEL2 and SEL3 both show strong peaks in requirements
changes. SEL1 and SEL3 both show peaks in the planned enhancement
category. SEL1 has a much stronger peak in the design
category than either of the others.

Sources Of NonClerical Errors 

All projects show a strong peak in the same place in the sources of 
nonclerical errors distributions. SEL3 may be distinguished from
the other SEL projects by its secondary peak in the "Design Multi-
Comp" category. SEL1 shows a somewhat stronger peak in the "Fnl
Spec" category than the other projects.

Effort To Design Change 

All SEL projects have design effort distributions of about the same
shape. The only variation is in the proportion of the distribution
contained in each category. SEL1 shows a considerably stronger
peak in the Easy category than any of the other projects.

Effort To Isolate Error Cause 

The distributions showing the effort to isolate error causes
apparently distinguish clearly between project SEL3 and the other SEL
projects. (Because of the relatively large number of errors in the
"Unknown" category in these distributions, the size of the
distinction may not be as large as it appears.)

Frequency Distribution Of Changes

The SEL1 and SEL2 component change frequency distributions show
a generally similar shape except for the first category.

Characteristics Of The SEL Projects 

By analyzing the foregoing distributions, the SEL projects may be 
characterized as follows. 

1. Software production takes place in an environment stable with
respect to hardware and software support.

2. Programs are produced by making many small changes to a set of 
initial code. A significant number (40% or more) of these 
changes are error corrections. Most of the changes are not
planned in advance. Relatively few of them result in errors. 

3. Most changes that are not error corrections are design changes 
made for the purposes of optimization, improving the clarity and 
maintainability of the code, improving the documentation 
(including comments in the code), or improving the services 
provided to the user by the program. 

4. Most errors occur in the design or implementation of one 
component of the system, and are easy to find and easy to 
correct. Errors are usually corrected on the first try. 

5. Although most changes are concentrated in two or three
subsystems, few individual components are changed more than
three or four times.

6. Although a project may have relatively many requirements 
changes, these changes do not constitute a major source of 
errors. Interface errors are also not especially troublesome.

Goal: Characterize Programmers (G11)

Because there are few commonalities in the distributions of programmer
errors, there is little that can be said to characterize the programmers
as a group. Most have little trouble with the language or other attributes of
the environment in which they program (e.g. the library system or the
operating system). All of them seem to have the most problems in designing
and implementing the internal structure of individual routines.

Goal: Find Factors That Significantly Affect Distributions Of Errors (G12)

It is not possible in these studies to isolate particular factors and exam- 
ine their effect on the various error distributions. Nevertheless, it was ex- 
pected that patterns of influence would be visible. One expected pattern was 
that the distribution of sources of modifications would affect the distribution 
of sources of errors, e.g. the greater the number of requirements changes, 
the greater the number of requirements errors. This expectation was not
confirmed; the sources of errors seem to be relatively independent of the
sources of modifications.

Other factors that were expected to contribute heavily as error sources, 
but apparently did not, include the software development environment, the 
programming language used, misunderstandings of interfaces, project size, 
and misunderstandings of specifications.

The error distributions for the SEL projects indicate that the single 
most important factor is the method used by the individual programmer in 
designing and coding individual routines. More detailed studies of individual
programmer techniques in the SEL environment might indicate particular 
methodological weaknesses. 

Generalization of these results to other environments may not be possible.
In the SEL projects certain circumstances may have acted to decrease
the effects of certain factors. SEL experience with the application, and the
adaptation of previous designs in the development of new systems are in this
category.

4. Conclusions and Summary 

The SEL data collection projects showed that it was feasible to collect 
and validate data on all changes concurrently with software development. (A 
companion paper shows that it was necessary to perform validation by 
means of developer interviews.) The data collected permit the following 
characterization of the SEL environment, projects, and programmers. 

1. Error corrections aside, the most frequent type of change is an 
unplanned design modification. Such modifications are usually made 
for one of the following reasons:

a. to optimize the program, 

b. to improve the services the program offers to its users, or 

c. to improve the clarity and maintainability of the program 
and its documentation. 

2. The most common type of error is one made in the design or 
implementation of a single component of the system. Incorrect 
requirements, and misunderstandings of functional specifications, 
interfaces, support software and hardware, and languages and 
compilers are generally not significant sources of errors.

3. Despite a significant number of requirements changes imposed on 
some projects, there is no corresponding increase in frequency of 
requirements misunderstandings. A possible explanation is that the 
developers understand the application sufficiently well that their
design is easily adaptable to most requirements changes, i.e. they
know what kinds of changes to expect and have designed for them.

4. More than 75% of all changes take a day or less to correct. Most 
programmers apparently spend their time making many small changes 
to their programs, rather than few large ones. 

5. Changes tend to be nonlocalized with respect to individual components
(most components that are changed are only changed once or twice),
but localized with respect to subsystems (the majority of changes are
made in one or two subsystems).

6. Relatively few changes result in errors. Relatively few errors 
require more than one attempt at correction.

7. Most errors are detected by executing the program. The cause of
most errors is found by reading code. Support facilities and
techniques such as traces, dumps (which were once so popular that
papers were published on how to read them, e.g. [17]),
cross-reference and attribute listings, and program proving
are rarely used.

Opportunities Missed 

The data presented here and in [3,2,6] represent five years of data
collection. During that time there was considerable and continuing consideration
given to the appropriate goals and questions of interest. Nonetheless, as data
were analyzed, it became clear that there was information that was never
requested but that would have been useful. An example is the length of time each
error remained in the system. Programmers correcting their own errors, which
was the usual case, could supply this data easily. One could then isolate errors
that were not easily susceptible to detection by program execution or code
reading. This example underscores the need for careful planning prior to the
start of data collection.

Comparing Environments 

In most sciences, valuable information is gained from repeating experi- 
ments, sometimes to confirm new results, other times to refine them. We be- 
lieve this should be the case in Computer Science. Although some interesting 
patterns are exhibited in the SEL data, it would be useful to seek similar trends 
in data from other environments. Unfortunately, there exists little comparable
data ([4] is one exception). A primary reason for devising the data collection
methodology used here is to show how comparable data from different environments
may be collected. Common goals, questions of interest, and data
categorizations may be used to ensure comparability.

Answering Questions of Interest

The questions of interest are answered by presenting and analyzing the data
distribution(s) associated with each question. For each question there is a short
discussion of the associated distributions. The main purpose of the discussions
is to point out various features of the distributions that are of significance in
answering the questions. Table 8 shows the relation between the questions and
the distributions. Not all questions are discussed here. Question 17, "How often
must reported change data be corrected as a result of the data validation
process?" is discussed in a companion paper [1].

For some questions either there were insufficient data to answer the
questions, or the data were judged insufficiently reliable to produce meaningful
distributions. Interpretations of the questions as they relate to the goals of the
studies are given in a later section.

One purpose of this research is to provide a set of empirically-derived data
that others may use in constructing models and deriving hypotheses. The data
presented here may be so used. Most of the presentations are in the form of
histograms based on the data categorizations previously discussed. The following
sections are intended to help the reader understand the organization and
content of the various histograms and tables.

Organization of Data Presentation 

In general, the histograms are organized into figures, with each figure
containing corresponding histograms for all projects. Examples are figure 1, which
shows a broad view of all change data, and figure 3, which shows the sources of
nonclerical errors for all projects. For some figures, not all projects are
represented, since a particular set of data may not be relevant or available for
some projects.

Tables are used to show the relationship between two different categoriza- 
tions, such as effort to design modification according to source of modification 
(table 9). Labels on the histograms and tables are generally mnemonic
abbreviations of descriptions of data categories (e.g. PE means planned enhancement).
Keys, supplied for non-obvious labels, provide the complete name for each 
mnemonic. 

Data Categorization 

During the data collection period, several improvements were made to the
forms. One result is that forms for some of the projects contain more
categories than for others. A second result is that there are occasional
differences in the names and meanings of similar subcategories for different
projects within a particular figure. Such differences in categorization are
discussed in the next few sections.

Changes In Measurement Precision 

Data categories for some of the projects contain finer data quantifications
than others. An example is the SEL1 and SEL3 categories shown in figure 10,
"Effort To Change NonClerical Errors." The SEL3 figure has a larger set of
categories than the SEL1 figure. After analyzing the results of our early data
collection efforts, we realized it was both possible and of interest to use a
finer measure of effort.
    Question                                            Figures/Tables

 1. What was the distribution of modifications          Figures 3, 4
    according to the reason for the modification?

 2. What was the distribution of changes across         Figures 14, 15
    system components?

 3. What was the distribution of effort required to     Figures 8, 9, 10
    design changes?

 4. What was the ratio of changes not made to correct   Data not sufficiently
    an error to error corrections as a function of      reliable to produce
    time during the development cycle?                  meaningful distribution.

 5. What was the distribution of errors according to    Figures 5, 6, 7
    the misunderstandings that caused them?

 6. What was the distribution of effort required to     Figures 10, 11, 12, 13
    correct errors?

 7. What was the distribution of effort to correct      Tables 11, 12, 13, 14,
    errors across misunderstandings causing errors?     15, 16

 8. How many errors were the result of software         Table 5
    changes?

 9. What was the distribution of errors across error    Tables 17, 18, 19
    detection techniques?

10. What was the distribution of errors across error    Tables 20, 21, 22
    correction techniques?

11. What was the number of attempted error              Table 5
    corrections per error?

12. What was the distribution of error corrections      Figure 18
    across project phases?

13. What was the ratio of errors to various measures    Tables 4, 5, 6, 7
    often associated with effort and productivity?

14. What was the distribution of errors per person      Table 6
    according to the number of people involved?

15. What was the number of errors for projects          Insufficient data for
    requiring memory overlays compared to those not     meaningful results.
    requiring overlays?

16. What was the distribution of errors according to    Figure 19
    programmer?

17. How often must reported change data be corrected    Presented elsewhere
    as a result of the data validation process?


Table 8. Figures/Tables Used in Answering Questions

Insufficient Subcategorization 

As a result of inexperience, some data categories were too broad, and some
too narrow on the early versions of the data collection forms. As an example, a
design change category was included on the form at one time. So many changes
were reported in this category that it was important to subcategorize further.
(The next version of the form contained the new subcategories explicitly.)
Figure 3 shows the subcategories for all SEL projects. Conversely, environment
changes occurred sufficiently rarely that it was unnecessary to distinguish
between hardware and software environment changes. These categories were
merged during data analysis.

The "Unknown" Category

Despite the intensive review and interview process used for validation, there
were still cases where it was not possible to categorize certain changes. This
occurred most often for the various effort categories when forms were
generated. These cases are categorized as unknown in the histograms where they
appear.

Fine Distinctions That Can Be Made 

For much of the data, the variety of data categorizations, the comments
supplied by the programmers, and the information gained from validation
permit certain fine distinctions to be drawn during analysis. An example is the
distinction among errors affecting more than one component, design errors
involving several components, and interface errors.
Interface errors may be divided into 2 classes. The first class consists of
incorrect assumptions between modules and routines. An example involved an
assumption about initialization. The programmer of one module assumed that it
was necessary to invoke an initialization routine from a second module each
time he used certain routines from the second module. This assumption was
incorrect. The second class consists of errors in using interfaces, where such
errors are not the result of incorrect assumptions. An example is a
programmer forgetting to include a parameter in a calling sequence.

Design errors involving several components are errors in the organization of
the software into components, including the specifications that describe that
organization. Although this category includes many interface errors, it also
includes errors that are not interface errors.

Errors affecting more than one component are errors whose corrections
require changes to be made in more than one component. These errors may fit
any of the categories of misunderstandings and are not necessarily interface
errors.


Distinctions That Were Too Fine

For some categories, developers were asked to make fine distinctions in
supplying the data. The metric used for measuring difficulty of fixing
nonclerical errors (see figure 10) is an example. For SEL1 and SEL2, programmers
were asked to separate the effort just to design the change from the effort to
make the change. This distinction was too fine for the programmers reporting the
effort, and during SEL3 data collection just the total effort was requested.

Comparing Distributions - Arithmetic Considerations 

To convert raw data counts into measures that could be used to compare
projects, percentage of changes in a particular category is usually used. As an
example, in figure 5, values in the distributions are shown as percentages of
nonclerical errors. Because there are generally large differences in values
within any distribution, the values are rounded to whole percents. For each
distribution, any category that is nonempty is assigned a nonzero value. As a
result, some categories that contain less than .5% of the distribution are shown
as containing 1%. (Categories that contain no data do not appear in the
distributions.) For no distribution does this make a difference of more than 1%
in any category. For some distributions, there is a resulting round-off error.
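
The conversion described above can be sketched in a few lines of Python. This
is an illustration only: the function name, the category labels, and the counts
are made up, and the choice of rounding halves upward is an assumption, since
the text does not say how exact halves were treated.

```python
import math


def to_whole_percents(counts):
    """Convert raw category counts to whole percents.

    Empty categories are dropped. Nonempty categories that would round
    to 0% are shown as 1%, so every nonempty category stays visible.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    percents = {}
    for category, n in counts.items():
        if n == 0:
            continue  # categories with no data do not appear
        pct = math.floor(100 * n / total + 0.5)  # assumed: round half up
        percents[category] = max(pct, 1)
    return percents


# Made-up counts: "C" holds 0.2% of the total and is shown as 1%.
example = to_whole_percents({"A": 300, "B": 199, "C": 1, "D": 0})
print(example)
print(sum(example.values()))
```

In this made-up case the displayed values sum to 101 rather than 100, which is
the kind of round-off error the text mentions.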

Answers To The Questions 

In the following sections we discuss the answers to the questions of interest.
For some questions, the data are not sufficiently complete or accurate to
provide meaningful or reliable answers. The reasons for this have been discussed in
previous sections; where necessary, they are elaborated. Sections are headed
by short descriptions of questions. Identifiers in parentheses following the
question descriptions are references to the question number, e.g. (Q2) is a
reference to question 2.

Overview Of SEL Changes

There is no question that deals with all changes; modifications and errors
are characterized separately. Nevertheless, analysis of the data showed that it
was of interest to look at the overall change distributions and compare them
across projects.

Figures 1 and 2 show some interesting differences among the three projects.
The proportions of all errors and of nonclerical errors both decline from
SEL1 (64% and 47% respectively) through SEL3 (40% and 32% respectively). The
SEL3 developers also appear to have been considerably more occupied with
making modifications than with correcting nonclerical errors. Various
parameters that normalize number of changes and errors with respect to size in
terms of effort and lines of code show the same trend. From these distributions
and parameters it appears that there are distinct differences among SEL
projects, and that some projects seem to have considerably less trouble in the
development phase than others.

What was the distribution of modifications according to the reason for the
modification? (Q1)

Modification distributions are shown in figure 3. All projects show a strong
spike in the design change subcategory. There is considerable variability in
several other categories. SEL2 and SEL3 both experienced relatively large
numbers of requirements changes. SEL1 and SEL3 both show considerable use
of planned enhancements.

Similarities in the distributions show that all three projects operated in a
stable environment, where there were few changes to the support software and
hardware, and that none of them made many changes for the purpose of adding
or deleting debug code.

Figure 4 is an analysis of design modifications only. Again, there is
considerable variability in the distributions. SEL1 programmers were considerably
concerned with optimization, i.e. improving the efficiency of use of memory and
processor time, and improving the services the system offered to its users.

The SEL2 distribution, whose pattern is somewhat less clear because of the
large size of the "unknown" category, also shows emphasis on optimization, and,
to a considerably lesser degree, on improving user services and the clarity and
maintainability of the program and its documentation. In SEL3, the emphasis is
reversed: there were relatively few attempts at optimization, but many at
improving clarity, maintainability, and documentation. It is interesting to note
that SEL3 had the same task leader and some of the same staff as SEL2.

What was the distribution of changes across system components? (Q2) 

In other discussions of changes, we view a change as a logical unit, indepen- 
dent of how much code or documentation, or how many components were 
involved. For purposes of analyzing frequency distributions of changes, we
consider the number of changes made to each component. The number of changes
made to a component is considered to be the number of change report forms on
which that component is named as being changed. Using this definition of
change, figure 14 shows the percentage of components that were changed once,
twice, etc. As an example, for SEL1, 29% of the components were changed once,
and 30% were changed twice.
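
The counting rule just described can be sketched as follows. This is a minimal
Python illustration, assuming each change report form is represented simply as
the set of component names it lists; the function name and the sample forms are
hypothetical and are not SEL data.

```python
from collections import Counter


def change_frequency_distribution(forms):
    """Return {times_changed: percent of changed components}.

    `forms` is an iterable of collections of component names, one per
    change report form. A component is counted once per form naming it.
    """
    per_component = Counter()
    for components in forms:
        per_component.update(set(components))  # one count per form
    changed = len(per_component)
    if changed == 0:
        return {}
    histogram = Counter(per_component.values())
    return {k: 100 * v / changed for k, v in sorted(histogram.items())}


# Made-up forms: A is named on two forms; B, C, and D on one form each.
forms = [{"A", "B"}, {"A"}, {"C"}, {"D"}]
print(change_frequency_distribution(forms))
```

Components that were never changed do not appear on any form, so the
percentages are relative to changed components only, as in figure 14.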

The frequency distributions for all the SEL projects show the same pattern:
50% or more of the components that were changed were only changed once or
twice, and more than 90% were changed 6 times or less. The pattern is even
more pronounced for fixes (figure 15): 70% or more of the fixed components
were only fixed once or twice.

Figure 16 shows the patterns of subsystems that are changed and fixed
most often. (The distributions are obtained by grouping the data for the
components into subsystems.) It is clear from these distributions that at most 2 or
3 of the subsystems receive the most attention.

What was the distribution of effort required to design changes? (Q3)

Change effort distributions are shown in figures 8 through 13. Examining
figure 8, which shows the effort for all changes except clerical errors, one can
see that most (more than 75% of) changes fall into the easy or medium
categories for all SEL projects. Figure 9, which is restricted to modifications
only, shows a similar, but not as strong, trend. The trend is most pronounced
for nonclerical errors.

What was the distribution of errors according to the misunderstandings that 
caused them? (Q5) 

Inspection of the distributions showing sources of nonclerical errors (figure
5) shows noteworthy similarities across projects. The distributions all show
strong spikes in the same places; it is evident that the major source of errors is
in the design and implementation of single components.

Factors such as misunderstandings of requirements and specifications are
minor sources of errors. (Note that figure 3 shows significant numbers of
requirements changes for projects SEL2 and SEL3. The SEL developers
apparently understand their requirements well enough that they can handle
changes to them without much trouble.) Interfaces are also a minor error
source (figure 7).

Further analysis of the errors committed in design and implementation of
components is shown in figure 6. In the SEL environment, data errors (errors in
the value or structure of data) are either about evenly balanced with, or
predominate over, errors in the intended use of components.

What was the distribution of effort required to correct errors? (Q6)

Effort distributions for correcting errors are shown in figure 10. (Note that
there is a slight difference in the type of effort measured for SEL3 than for SEL1
and SEL2.) As shown by these distributions, most error corrections take little
effort. For all projects, approximately 50% or more of the errors were corrected
in one hour or less, and more than 85% were corrected in one day or less.

As might be expected, the distributions for effort expended in finding error
causes (figures 11, 12, and 13) follow a similar pattern. From these results we
may conclude that the programmers tend to spend their time finding and
correcting many "small" errors rather than few "large" errors.

What was the distribution of effort to correct errors across misunderstandings 
causing errors? (Q7) 

Tables 11 through 16 support the view of most errors as being easy to find
and fix and as occurring in component design or implementation. Very few
errors take more than a day of effort to fix. Although interface errors are often
cited as being particularly difficult to correct, table 13 shows that they follow
the same pattern as other subcategories of errors.

The only deviation from the pattern appears to occur in the effort to fix
requirements and specification errors, where the distribution between easy and
medium rated errors is more balanced than for the other subcategories. These
results suggest that requirements and specification errors are more difficult to
correct than others. However, the small number of errors in these
subcategories makes it dangerous to draw such a conclusion.

How many errors were the result of a software change? (Q8)

Table 5 shows that the SEL projects handled changes with little trouble; 
relatively few errors were the result of a change to the software. 

What was the distribution of errors across error detection techniques? (Q9) 

The relative frequencies of use of various error detection techniques are
shown in tables 17 through 19 for the SEL projects. While examining the
distributions, one must recall that SEL change monitoring did not begin until code
was baselined and had already undergone debugging. Otherwise, error messages
might rank higher as a detection technique.

Executing the program was the most successful means for detecting errors. 
The distributions show what might be called a traditional approach to error 
detection: either test runs, or a programmer reading over her own code.

What was the distribution of errors across error correction techniques? (Q10) 

The relative frequencies of use of various error correction techniques are
shown in tables 20 through 22. While it is not surprising that code reading by
the programmer dominates all other methods, the relative infrequency of tech- 
niques such as traces, special debug code, test runs, and reading documentation 
is somewhat surprising. Dumps, which were once so popular that papers were 
published on how to read them (e.g. [17]), were rarely used. 

What was the number of attempted error corrections per error? (Q11) 

If any of the projects suffers from a ripple effect, we expect to see many 
errors requiring repeated attempts at correction, and many changes each 
resulting in several errors. As can be seen from table 5, both of these effects 
appear quite small. The worst case is about 6% of the changes resulting in 
errors (SEL2). The errors resulting from change for the worst case (SEL2) 
comprised 14% of all errors. Finally, very few errors required more than one 
attempt at correction (these are a subset of the errors resulting from change, 
since each attempted correction is considered to be a change). 
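
The ripple-effect measures discussed above can be computed directly from change 
and error records. The following is a minimal sketch in Python. The record 
layout (`Change.errors_caused`, `ErrorRecord.resulted_from_change`, 
`ErrorRecord.correction_attempts`) and the sample counts are assumptions made 
for illustration; they are not the SEL data or the SEL change report format. 

```python
from dataclasses import dataclass


@dataclass
class Change:
    errors_caused: int  # hypothetical: new errors traced back to this change


@dataclass
class ErrorRecord:
    resulted_from_change: bool  # error was introduced by an earlier change
    correction_attempts: int    # attempts made before the correction held


def ripple_measures(changes, errors):
    """Return three ripple-effect percentages for one project."""
    if not changes or not errors:
        raise ValueError("need at least one change and one error")
    causing = sum(c.errors_caused > 0 for c in changes)
    from_change = sum(e.resulted_from_change for e in errors)
    repeated = sum(e.correction_attempts > 1 for e in errors)
    return {
        "pct_changes_resulting_in_errors": 100.0 * causing / len(changes),
        "pct_errors_from_change": 100.0 * from_change / len(errors),
        "pct_errors_needing_repeat": 100.0 * repeated / len(errors),
    }


# Invented sample: 50 changes, 20 errors.
changes = [Change(1)] * 3 + [Change(0)] * 47
errors = (
    [ErrorRecord(True, 2)]
    + [ErrorRecord(True, 1)] * 2
    + [ErrorRecord(False, 1)] * 17
)

print(ripple_measures(changes, errors))
```

In the sample, the one error needing a repeated correction is also marked as 
resulting from a change, which mirrors the convention that each attempted 
correction is itself counted as a change. 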

What was the distribution of error corrections across project phases? (Q12) 

The distributions of errors according to the phase of the project in which 
the error entered the system are shown in figure 18. All projects show a strong 
spike in the code and test phase. These distributions are somewhat less reliable 
than others because programmers could not always decide exactly when a par- 
ticular error occurred. The unknown subcategory comprises such cases. 

What was the ratio of errors to various measures often associated with effort 
and productivity? (Q13) 

What was the distribution of errors per person according to the number of peo- 
ple involved? (Q14) 

Because of their similarity, questions 13 and 14 are answered together. 

Tables 4 through 7 show a variety of ways of normalizing error rates to 
productivity measures. Each normalization may be used to rank the projects. For 
the six different normalizations there are six different rankings. 
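
A small sketch shows how the choice of normalization can change a ranking. The 
project names, error counts, and measure values below are invented for 
illustration and are not the figures of tables 4 through 7; the measure names 
(`lines_of_code`, `person_months`) are likewise assumptions. 

```python
def rank_projects(error_counts, measures):
    """For each measure, order projects from lowest to highest error rate.

    `error_counts` maps project -> number of errors.
    `measures` maps measure name -> {project -> value of that measure}.
    """
    rankings = {}
    for name, values in measures.items():
        rates = {p: error_counts[p] / values[p] for p in error_counts}
        rankings[name] = sorted(rates, key=rates.get)
    return rankings


# Invented figures, for illustration only.
error_counts = {"A": 100, "B": 150, "C": 120}
measures = {
    "lines_of_code": {"A": 50000, "B": 60000, "C": 80000},
    "person_months": {"A": 100, "B": 60, "C": 80},
}

for name, order in rank_projects(error_counts, measures).items():
    print(name, order)
```

With these figures, project C has the lowest error rate per line of code while 
project A has the lowest rate per person-month, so the two normalizations rank 
the projects differently. 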

What was the distribution of errors according to programmer? (Q16) 

Distributions of errors for individual programmers are shown in figure 9. As 
with the project error distributions (e.g. figure 5), the individual programmer 
error distributions all show peaks in the "Design Single Comp" category. Both 
the relative size of this peak and the variation over the remainder of the 
distribution are considerably more variable among the different programmers 
than among the different projects. 

Table 9. Effort To Modify By Source Of Mod 
(As Percentage Of Total Mods) 


Design    Modifications caused by changes in design 
Debug     Modifications to insert or delete debug code 
Env       Modifications caused by changes in the hardware or software environment 
PE        Planned enhancements 
Req       Modifications caused by changes in requirements or functional specifications 
Unknown   Causes of these modifications are not known 

Table 10. Effort To Modify By Source Of Mod (Design Mods Only) 
(As Percentage Of Total Mods) 


Key 

Clarity   Improvement of clarity, maintainability, or documentation 
Opt       Optimization of time/space/accuracy 
Unknown   Causes of these design changes are not known 
US        Improvement of user services 

Table 11. Effort To Design Fix By Source Of Error 
(As Percentage Of NonClerical Errors) 


Key 

Design Multi-Comp         Design error involving several components 
Design/Impl Single Comp   Error in the design or implementation of a single component 
Env                       Misunderstanding of external environment, except language 
Fnl Spec                  Functional specifications incorrect or misinterpreted 
Lang                      Error in use of programming language/compiler 
Req                       Requirements incorrect or misinterpreted 

Table 12. Effort To Design Fix By Source Of Error (Design Errors Only) 
(As Percentage Of NonClerical Errors) 


Key 

Data           Error in the use of data 

Intended Use   Error in intended function, i.e. program behavior does 
               not correspond to the intended use of the program 

Table 13. Effort To Design Fix For Interface Errors 
(As Percent Of NonClerical Errors) 



Easy 

Medium 

Hard 

NA 

Unknown 


LE 1 HR 

1 Hr To 1 Day 

GT 1 Day 



Req 

1 

1 




Fnl Spec 

2 

4 


5 

3 

Design 

2 

3 



2 

Multi-Comp 

Design/Impl 

31 

26 

2 

2 

5 

Single Comp 






Lang /Compiler 
Env 

1 



1 

1 

Other 

1 

SEL1 


7 




Easy 

Medium 

Hard 

NA 

Unknown 


LE 1 HR 

1 Hr To 1 Day 

GT 1 Day 



Req 

5 



1 

1 

Fnl Spec 


1 

1 

1 

Design 

1 

2 



Multi-Comp 

Design/Impl 

27 

32 

1 

4 

12 

Single Comp 
Lang /Compiler 

3 

2 

1 

1 

2 

Env 

Other 

1 

SEL2 


1 

2 




Easy 

Medium 

Hard 

NA Unknown 


LE 1 HR 

1 Hr To 1 Day 

GT 1 Day 


Req 

2 

3 

1 

1 

Fnl Spec 

2 

2 


1 

Design 

13 

8 

1 

4 

Multi-Comp 

Design/Impl 

35 

17 

1 

4 

Single Comp 
Lang /Compiler 

2 

2 


1 

Env 

1 

1 


Other 

1 

SEL3 



Table 14. Effort To Isolate Cause By Source Of Error 
(As Percentage Of NonClerical Errors) 


Key 

Design Multi-Comp         Design error involving several components 
Design/Impl Single Comp   Error in the design or implementation of a single component 
Env                       Misunderstanding of external environment, except language 
Fnl Spec                  Functional specifications incorrect or misinterpreted 
Lang                      Error in use of programming language/compiler 
Req                       Requirements incorrect or misinterpreted 

Table 15. Effort To Isolate Cause By Source Of Error (Design Errors Only) 
(As Percentage Of NonClerical Errors) 


Key 

Data Error in the use of data 

Intended Use Error in intended function, 
i.e. program behavior does 
not correspond to the in- 
tended use of the program 

Table 16. Effort To Isolate Cause For Interface Errors 
(As Percent Of NonClerical Errors) 

Table 17. SEL1 Frequency Of Use Of Error Detection Techniques 

Activities Used 

Table 19. SEL3 Frequency Of Use Of Error Detection Techniques 

Table 22. SEL3 Frequency Of Use Of Error Correction Techniques 